docker: support for requesting chunks without end offset #2391

giuseppe · 2024-05-02T14:16:31Z

if the specified length for the range is set to -1, then request all the data possible by using the "Range: =-" syntax for the HTTP range.

mtrmac

Thanks!

How this is happening? I can’t see any c/storage caller setting this to -1 .
The Length == MaxUint64 semantics is, AFAICS, not documented, and at least mergeMissingChunks is doing ordinary arithmetic on the values.
If this should exist at all, blobChunkAccessorProxy, splitHTTP200ResponseToPartial, streamChunksFromFile do not handle the special case.

giuseppe · 2024-05-02T14:58:04Z

the error I am seeing is that we pass the value -1 to

How this is happening? I can’t see any c/storage caller setting this to -1 .

the problem is here https://github.com/containers/storage/blob/main/pkg/chunked/storage_linux.go#L1611-L1617

in some cases the blobSize is set to -1 from c/image, e.g. when pulling quay.io/libpod/alpine-with-seccomp:label.

I could change the code to not use any range (if there are no ranges, then request the whole file), but I thought this would be more flexible. We can restrict it so that only the last chunk can have len=-1

mtrmac · 2024-05-02T16:58:08Z

OK, so this is

a schema1 image (unknown layer size)
the transparent conversion path (needs to read the full file)
ImageSourceSeekable not having a “read all” method at all.

Pedantically, when the size is unknown, while GetDiffer requires it (all three implementations currently do, and there is no “unknown” value documented, the locally correct thing to do is for PutBlobPartial to fail without even trying.

If you do want the conversion path to work on these images… yes, I think adding the “length unknown” case to ImageSourceSeekable / BlobChunkAccessor is the simplest approach. (Notably blobChunkAccessorProxy is simpler when there is only one API to call, having to add a whole separate “no-partial download” case to that would not be worth it). So, document the field value, allow it only for the last chunk.

But, at a higher level, this is really another argument to say that the transparent conversion should not be via PutBlobPartial/GetDiffer at all: let it work like an ordinary non-chunked pull, letting c/image deal with the whole blob and digest verification, and then do the conversion all the way down in overlay’s ApplyDiff.

giuseppe · 2024-05-02T19:23:19Z

OK, so this is

a schema1 image (unknown layer size)

the transparent conversion path (needs to read the full file)

ImageSourceSeekable not having a “read all” method at all.

Pedantically, when the size is unknown, while GetDiffer requires it (all three implementations currently do, and there is no “unknown” value documented, the locally correct thing to do is for PutBlobPartial to fail without even trying.

If you do want the conversion path to work on these images… yes, I think adding the “length unknown” case to ImageSourceSeekable / BlobChunkAccessor is the simplest approach. (Notably blobChunkAccessorProxy is simpler when there is only one API to call, having to add a whole separate “no-partial download” case to that would not be worth it). So, document the field value, allow it only for the last chunk.

But, at a higher level, this is really another argument to say that the transparent conversion should not be via PutBlobPartial/GetDiffer at all: let it work like an ordinary non-chunked pull, letting c/image deal with the whole blob and digest verification, and then do the conversion all the way down in overlay’s ApplyDiff.

we need to be able to pull via the PutBlobPartial when convert_images is enabled. composefs needs it to calculate the digests and enable fs-verity for each file.

mtrmac · 2024-05-02T19:38:45Z

As far as c/image is concerned, with UncompressedDigest known, the layer’s singleLayerIDComponent is going to be the DiffID, and the ComposeFS usage is ~invisible to c/image. I naively imagine that whether the call stack says GetDiffer or ApplyDiff does not really matter for fs-verity and other work.

One change that does make a difference, however, is that GetDiffer is parallelized over up to 6 layers concurrently, but layer commit is sequential. That might well be a very strong reason to do the conversion in GetDiffer, not in the commit phase, I didn’t try to measure that.

If this does help, arguably the layer extraction should be similarly made concurrent for non-chunked/non-ComposeFS layers as well. But that redesign would really be a very different conversation (and we might end up deprecating non-ComposeFS layers before that).

giuseppe · 2024-05-03T10:14:16Z

I've tried adding a new method GetBlob to retrieve the entire blob, but I think the current API is more flexible, with a minimal change.

I've updated the comments

mtrmac

Thanks. The existing code LGTM, but other parts need updating as well.

copy/progress_bars.go

pkg/blobcache/src.go

internal/private/private.go

docker/docker_image_src.go

mtrmac · 2024-05-03T18:35:11Z

docker/docker_image_src.go

@@ -255,9 +256,15 @@ func splitHTTP200ResponseToPartial(streams chan io.ReadCloser, errs chan error,
 			}
 			currentOffset += toSkip
 		}
+		var reader io.Reader
+		if c.Length == math.MaxUint64 {
+			reader = body


Non-blocking:

Suggested change

reader = body

reader = body // The caller has ensured that this is the last requested chunk

giuseppe · 2024-05-03T20:17:14Z

thanks, addressed your comments in the last version

rhatdan · 2024-05-13T18:57:20Z

Seems like there is one comment by @mtrmac that was not addressed?

giuseppe · 2024-05-13T19:26:38Z

Seems like there is one comment by @mtrmac that was not addressed?

I thought I've addressed them all, which one is still missing?

rhatdan · 2024-05-13T20:55:59Z

https://github.com/containers/image/pull/2391/files#r1589601556

if the specified length for the range is set to -1, then request all the data possible by using the "Range: <unit>=<range-start>-" syntax for the HTTP range. Signed-off-by: Giuseppe Scrivano <[email protected]>

giuseppe · 2024-05-14T08:37:32Z

https://github.com/containers/image/pull/2391/files#r1589601556

Thanks. Fixed now

mtrmac

Thanks!

giuseppe mentioned this pull request May 2, 2024

overlay: lock staging directories containers/storage#1916

Merged

mtrmac requested changes May 2, 2024

View reviewed changes

giuseppe force-pushed the docker-image-support-missing-end-range branch 2 times, most recently from 679afe4 to e09da45 Compare May 3, 2024 09:14

mtrmac reviewed May 3, 2024

View reviewed changes

giuseppe force-pushed the docker-image-support-missing-end-range branch from e09da45 to 098926a Compare May 3, 2024 20:05

mtrmac added the kind/feature A request for, or a PR adding, new functionality label May 13, 2024

docker: support for requesting chunks without end offset

b47707f

if the specified length for the range is set to -1, then request all the data possible by using the "Range: <unit>=<range-start>-" syntax for the HTTP range. Signed-off-by: Giuseppe Scrivano <[email protected]>

giuseppe force-pushed the docker-image-support-missing-end-range branch from 098926a to b47707f Compare May 14, 2024 08:37

mtrmac approved these changes May 14, 2024

View reviewed changes

mtrmac merged commit 984c624 into containers:main May 14, 2024
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docker: support for requesting chunks without end offset #2391

docker: support for requesting chunks without end offset #2391

giuseppe commented May 2, 2024

mtrmac left a comment

giuseppe commented May 2, 2024

mtrmac commented May 2, 2024

giuseppe commented May 2, 2024

mtrmac commented May 2, 2024

giuseppe commented May 3, 2024

mtrmac left a comment

mtrmac May 3, 2024

giuseppe commented May 3, 2024

rhatdan commented May 13, 2024

giuseppe commented May 13, 2024

rhatdan commented May 13, 2024

giuseppe commented May 14, 2024

mtrmac left a comment

	reader = body
	reader = body // The caller has ensured that this is the last requested chunk

docker: support for requesting chunks without end offset #2391

docker: support for requesting chunks without end offset #2391

Conversation

giuseppe commented May 2, 2024

mtrmac left a comment

Choose a reason for hiding this comment

giuseppe commented May 2, 2024

mtrmac commented May 2, 2024

giuseppe commented May 2, 2024

mtrmac commented May 2, 2024

giuseppe commented May 3, 2024

mtrmac left a comment

Choose a reason for hiding this comment

mtrmac May 3, 2024

Choose a reason for hiding this comment

giuseppe commented May 3, 2024

rhatdan commented May 13, 2024

giuseppe commented May 13, 2024

rhatdan commented May 13, 2024

giuseppe commented May 14, 2024

mtrmac left a comment

Choose a reason for hiding this comment